Parallel evolution despite low genetic diversity in three-spined sticklebacks

When populations repeatedly adapt to similar environments they can evolve similar phenotypes based on shared genetic mechanisms (parallel evolution). The likelihood of parallel evolution is affected by demographic history, as it depends on the standing genetic variation of the source population. The three-spined stickleback (Gasterosteus aculeatus) repeatedly colonized and adapted to brackish and freshwater. Most parallel evolution studies in G. aculeatus were conducted at high latitudes, where freshwater populations maintain connectivity to the source marine populations. Here, we analysed southern and northern European marine and freshwater populations to test two hypotheses. First, that southern European freshwater populations (which currently lack connection to marine populations) lost genetic diversity due to bottlenecks and inbreeding compared to their northern counterparts. Second, that the degree of genetic parallelism is higher among northern than southern European freshwater populations, as the latter have been subjected to strong drift due to isolation. The results show that southern populations exhibit lower genetic diversity but a higher degree of genetic parallelism than northern populations. Hence, they confirm the hypothesis that southern populations have lost genetic diversity, but this loss probably happened after they had already adapted to freshwater conditions, explaining the high degree of genetic parallelism in the south.


Introduction
Parallel evolution, defined as the evolution of similar phenotypes based on shared genetic mechanisms [1][2][3], takes place when different species or populations repeatedly adapt to similar environments.Studying parallel evolution is central to understanding local adaptation, as repeated phenotypic changes within similar environments directly imply the role of natural selection in the evolution of these traits [4].Theoretical [5], experimental [6] and empirical [7,8] studies have demonstrated that selection on adaptive alleles which are identical by descent and harboured as standing genetic variation (SGV) in the source population is one of the most important pathways to rapid parallel evolution.Indeed, many of the most spectacular examples of rapid parallel evolution of ecotypes within species can be explained by selection on SGV whose origin is far more ancient than the newly colonized populations [9][10][11][12].
The likelihood of parallel evolution from SGV is influenced by the effective population size (N e ) of both the source population and the populations invading the new environment [5,13,14], gene flow [15,16], and is expected to be higher for large-than smalleffect-loci [17].This is because in small populations potentially beneficial alleles are more likely to be stochastically lost, and the effects of genetic drift [18,19] or migration [20] may override the effects of natural selection, lowering the probability of adaptation, particularly for loci of small effect [16,21].Understanding how demographic factors affect the likelihood of parallel evolution is also important to understand the evolutionary potential of populations.Both processes are limited by available SGV and the same demographic factors that reduce the likelihood of parallel evolution restrict the potential for local adaptation [16].
The three-spined stickleback (Gasterosteus aculeatus L.) is a model organism for studies of local adaptation, especially for the study of parallel evolution [15,[22][23][24].The contemporary species range extends across most of the northern hemisphere [25][26][27].Three-spined sticklebacks originated in the northern Pacific [28][29][30], from where they colonized the Atlantic Ocean and then the Baltic Sea, the Mediterranean Sea and the Black Sea [30].Within all these different biogeographic regions, marine stickleback populations successfully invaded different brackish and freshwater habitats, evolving similar morphological traits as adaptations to life in low salinity environments [26,[31][32][33].These adaptations involve several traits, such as body armour and lateral-plate reduction in freshwater populations [34], male courtship behaviour [35], body shape, physiological adaptations to different salinities and trophic specialization [31].
The high prevalence of multi-trait parallel evolution in this species has been attributed to the mechanism described by the 'transporter hypothesis' [1].The hypothesis posits that adaptive alleles in freshwater populations are transported to marine populations, where they are maintained at low frequencies via recurrent gene flow.Recombination in the marine population weakens the correlation among freshwater-adapted alleles, disintegrating the freshwater haplotype.As new streams are formed and colonized from the sea, selection and recombination reassemble the freshwater-adapted haplotype in the newly colonized location, allowing rapid responses to selection on ecological timescales [12].Recent studies, however, demonstrated that this parallelism is not universal and is constrained by demographic history, especially by effective population size (N e ) and gene flow [3,16,[36][37][38][39].Most notably, it has been shown that signatures of parallel evolution in this species are strongest in the Eastern Pacific region and lower in other parts of the world [15,37,38].
During Pleistocene glaciations, three-spined stickleback populations inhabiting high-latitude areas of Europe were eradicated, whereas populations residing in (or moving to) the south persisted in refugia [27].After the retreat of the ice sheets covering northern Europe, the high-latitude areas became recolonized by migration from the south, and hence, today's northern European populations are relatively young [27,30,38,40,41].These contrasting histories between high and low latitude populations explain some of the clear differences we see today in both genetic diversity and differentiation among populations from these latitudinal extremes.
Population genetic studies of G. aculeatus have been focused on high-latitude areas, where freshwater populations are typically less than 10 000 years old.Few studies have focused on lower-latitude areas [27,40,42,43], and these have used a limited number of microsatellite markers and mtDNA.As genetic diversity usually decreases moving away from the centre of origin of a range expansion [44], there are good reasons to believe that southern populations (some of which may be over 40 000 years old [30,41,45]) host most of the ancestral polymorphism that fuelled subsequent expansions to the north.Indeed, this pattern is observed in many European species that had glacial refugia in Southern Europe [46,47].However, recent work on sticklebacks from the Adriatic Sea area revealed lower genetic diversity than in northern Europe [45].
Nowadays European marine three-spined stickleback populations extend from the Bay of Biscay through the north up to the Barents Sea and can be also be found in the Black Sea, but remain absent from the Iberian Atlantic coast and the Mediterranean Sea, with the exception of certain coastal lagoon areas [48].On the other hand, freshwater populations have a continuous distribution from Mediterranean coastal areas up to the Barents Sea [25,32,42].Consequently, current freshwater populations from southern Europe are unlikely to be in contact with the marine source population(s).This has likely subjected the southern European populations to stronger genetic drift, and erased part of the ancestral polymorphism.Indeed, southern European freshwater populations show a high degree of genetic differentiation over very short geographic distances [42,49].
This study had two main aims.First, to investigate whether the southern European freshwater populations (mainly in the Iberian Peninsula and around the Mediterranean basin) of the three-spined stickleback-which currently lack or have limited connection to marine populations-have lost genetic diversity due to population bottlenecks and inbreeding compared to their northern European counterparts ( populations from Fennoscandia).Second, to compare the extent of parallel evolution in southern versus northern European freshwater populations.To this aim we focused on specific genomic regions which have been shown to be consistently associated with freshwater colonization across a very broad geographic range in earlier studies [15,23] and display high linkage disequilibrium (LD) [15].Under the assumption that the lack of continued access for freshwater populations to SGV in the ancestral marine population reduces the likelihood of parallel evolution, we hypothesized that the degree of genetic parallelism (here defined as the proportion of individuals exhibiting freshwater adapted haplotypes) in genomic regions subject to positive selection in freshwater environments is lower in southern than in northern European populations.If, however, a reduction in genetic diversity and/or cessation of gene flow between southern European freshwater and marine populations occurred following freshwater adaptation, we expected a lower degree of genetic parallelism in northern than southern European populations.

Methods (a) Sampling design and sample collection
In total, the sampling design included 257 individuals from 36 populations (figure 1).
We obtained 149 samples from 11 populations from the Adriatic Sea basin and the Iberian Peninsula between 2003 and 2015.All samples were collected during the local breeding seasons using mist-nets, minnow traps or electric fishing, under appropriate permits (see Ethics).
royalsocietypublishing.org/journal/rspb Proc.R. Soc.B 291: 20232617 We also retrieved previously published RAD-seq data from an additional 118 individuals: 14 previously sequenced individuals collected from marine and freshwater Alaskan populations from Nelson and Cresko [12], 27 individuals from several European locations previously sequenced by Fang et al. [30], and 67 individuals from European and eastern Russian locations sequenced by Fang et al. [3].To study genetic parallelism, samples were classified as belonging to one of the following biogeographic regions [30]: Adriatic Sea, Alaska, Eastern Russia, English Channel, Fennoscandia, Gulf of Finland, Iberian Peninsula, Mur River and North Sea.The Mur River population is geographically close to the Adriatic Sea populations, but it was classified in a different biogeographic region because it belongs to the Danube Basin.Details of samples with their respective population names, geographic coordinates, country, habitat type, sampling years, DNA extraction methods and source of the sequence information are given in electronic supplementary material, tables S1 and S3.

(b) Laboratory methods (i) Sequencing
Genomic DNA was extracted from alcohol-preserved samples using different methods (see electronic supplementary material, table S1), and extracted DNA was shipped to BGI Hong Kong for library preparation.Library preparation and sequencing details for Alaskan samples can be found in Nelson & Cresko [12].For all other samples the methods were identical to those in Fang et al. [30].Briefly, restriction site associated DNA sequencing (RAD-seq) [50,51] was performed using the PstI enzyme and a P1 adaptor with unique barcodes.Forward amplification and sequencing primers were ligated.Ligation products were randomly sheared, and size selected using a (ii) Data preparation PCR duplicates were removed with the clone_filter program, part of the Stacks pipeline [52].The remaining read pairs were mapped to the Broad S1 reference genome (release 101), retrieved from the Ensembl database [53], using BWA v.0.7.17 [54] and SAMtools v. 1.10 [55].Given that we observed some heterogeneity in sequencing depth among samples, we performed most analyses using genotype likelihoods, although some analyses required hard-called genotypes.Variants were called using GATK v.4.0.1.2following GATK best practices workflow [56].A gVCF was created for each individual using HaplotypeCaller, all gVFCs were combined using CombinegVCFs, and a VCF with genotype likelihoods and genotype calls was produced using GenotypeVCF.The VCF was filtered to retain only biallelic SNPs with less than 25% missing data and a genotype quality above 20, resulting in a dataset of 16 040 046 SNPs.For a detailed description of the data analysis pipeline, please refer to the annotated scripts deposited in GitHub (https://github.com/Nopaoli/3s-Parallel-Evolution)and in Dryad [57].

(iii) Genetic diversity
We estimated observed individual heterozygosity (H O ) and nucleotide diversity (π) [58] from individual and population site frequency spectra generated with ANGSD, using the scripts from Momigliano et al. [59].We analysed runs of homozygosity (ROH) [60], long stretches of homozygous genotypes arising at a locus owing to identity-by-descent, to provide insight on the demographic history of the sampled populations [61].We calculated ROH from the input VCF file, further filtered to retain only loci with <10% missing data and a minor allele frequency (MAF) above 0.01 (resulting in 138 972 SNPs), using genotype likelihoods (PL) and plotted the number of ROH (N ROH ) against the sum of ROH (S ROH ) for each individual.F ROH , defined as the proportion of the autosomal genome covered by ROH, was calculated using ROH of all size classes, so no length cut-off was applied.ROH reflect population size, as small populations have more and larger ROH than larger ones (high N ROH and S ROH ).Consanguinity, on the other hand, generally results in less numerous but longer ROH [61].
We tested the correlation between habitat type and latitude with genetic diversity using linear models where both predictors were fitted to π and mean F ROH .The intraclass correlation coefficient (ICC = 0.45 < 0.5) indicated there was no strong correlation between the predictors.In addition, a one-way ANOVA was performed to test for significant differences in π across geographic regions.Geographic locations with less than two observations were removed (English Channel).A post-hoc analysis with Tukey-Kramer test was performed to check which pairs of geographic locations showed significant differences in π.For all analyses diagnostic plots were examined to check that the assumptions of normality of residuals and homoscedasticity were met.

(iv) Population structure
Pairwise F ST estimates were obtained with the R package diveRsity [62].For this we used the same data as per ROH analysis, but thinned so that adjacent SNPs were no closer than 10 000 bp, reducing the number of sites from 138 972 to 27 614.Populations with fewer than two individuals after filtering (ENG, NO-KRI, NO-SWE, AD-ITA and FE-BAR) were removed.
A PCA using genome-wide SNPs was performed using plink v.1.90(Purcell et al. [63]), excluding individuals with average depth <2, retaining only bi-allelic SNPs with genotype quality above 20 and <25% of missing data.This filtering reduced the dataset from 257 to 230 individuals (electronic supplementary material, table S2) and from 16 040 046 to 1 492 678 sites.

(v) Demographic history
We estimated one dimensional folded site frequency spectra using ANGSD (using the parameter flags -uniqueOnly 1 -remove_bads 1 -min-MapQ 20 -minQ 20 -C 50), and reconstructed changes in N e in southern and northern European populations using stairwayplot2 [64] using the same parameters as Dahms et al. [45].Regions including chromosomal inversions as well as all LD clusters involved in parallel evolution were removed.

(vi) Genetic parallelism
To study the degree of genetic parallelism among the stickleback populations we focused on twelve LD clusters (i.e.groups of loci forming high LD networks) [65] associated with freshwater adaptation across the Atlantic and Pacific regions [3] (LD clusters 5, 6, 10, 11, 12, 13, 16, 18, 20, 22, 25 and 27), which cover 58% of the regions involved in parallel evolution identified by Jones et al. [23].A principal component analysis (PCA) based on loci for each of these LD clusters is expected to separate individuals showing the marine and freshwater adapted genotypes along the 1st PC [3], as for these LD clusters the freshwater adapted alleles are nearly fixed in both Eastern Pacific and Eastern Russia freshwater populations.Therefore, we can use a quadratic discriminant analysis based on the computed PCs trained using Eastern Russian and Alaskan populations to assign, for each LD Cluster, the rest of individuals to a 'freshwater' or 'marine' haplotype.
Before the analyses we split two clusters that spanned large non-contiguous regions (clusters 11 and 27) into five smaller contiguous clusters (Clusters 11a, b, c and Clusters 27a and b).Furthermore, as patterns of LD can be distinct in different geographic regions, we ran linkage disequilibrium network analyses (LDna) [65] to retain, within each cluster, only the group of loci showing the strongest parallel changes in allele frequencies across marine-freshwater habitats (see relevant section in the electronic supplementary material, figure S1).Information on the chromosome, number of loci and genomic coordinates of each LD cluster within the Broad S1 reference genome (release-101) are given in electronic supplementary material, table S4.
We then performed PCAs for each cluster using two different methods.For the first method, for each LD cluster we estimated the covariance matrix directly from genotype likelihoods using PCAngsd [66], filtering for minimum allele frequency of 0.05.We then imported the covariance matrices in R using the RcppCNPy library [67] and calculated eigenvalues and eigenvectors.
For the second method, we performed a PCA from hard-called data with the Pacific samples (marine populations: RABBIT-SLOUGH, RUS-AN, RUS-ASH, RUS-KHA, freshwater populations: BEAR-PAW-LAKE, BOOT-LAKE) and projected the European samples on the PCA space.This method was employed to ensure that southern European populations, which are the most abundant in the dataset, do not exert a stronger influence on the PCA space than other samples.The hard-called data were generated by firstly calculating the individual allele frequencies and posterior genotype probabilities from the beagle files for each LD cluster using PCAngsd.Second, royalsocietypublishing.org/journal/rspb Proc.R. Soc.B 291: 20232617 genotypes were called from posterior genotype probabilities incorporating the individual allele frequencies as prior information.When performing the PCA using the Pacific samples, we removed loci with >20% missing data and replaced remaining missing data with the population mean.To project the European samples on the PCA space we removed the loci that had been removed in the Pacific samples and replaced the NA values following the same criteria as previously described.
For each PCA approach, we used a quadratic discriminant analysis (QDA, implemented in the R package MASS [68]) on the principal components (PCs) of each LD cluster to classify individual genotypes as marine or freshwater adapted [68].Since the degree of genetic parallelism is highest in Pacific populations of three-spined stickleback and we studied LD clusters known to be associated with genetic parallelism at the global scale, we used our Pacific populations (marine populations: RABBIT-SLOUGH, RUS-AN, RUS-ASH and RUS-KHA, freshwater populations: BEAR-PAW-LAKE and BOOT-LAKE) as a training set to classify European individuals.For each LD cluster we built four models, each one adding a new PC: in the first model we tested the PC1, in the second model the PC1 + PC2, in the third model the PC1 + PC2 + PC3 and in the fourth PC1 + PC2 + PC3 + PC4.We determined the accuracy of each of the models using leave-one-out cross-validation on the training dataset, and retained clusters with an accuracy>0.75.After determining the accuracy of the models, we plotted the results of the model with the highest accuracy for each LD cluster.The resulting test gives for each individual and LD cluster an assignment score ranging from 0 (marine genotype) to 1 (freshwater genotype), which we refer to as the Probability of carrying the Freshwater Genotype (PFG).It is important to keep in mind the limitations of the QDA approach that was followed.First, because Pacific populations were used as a training set, what is being specifically compared is the similarity to Pacific haplotypes.Second, the results are limited to regions that are considered globally parallel and may not reflect local parallelism within the European context.The possibility of identifying genomic regions of parallelism unique to the European populations was precluded by the SNP data resolution of our samples and the absence of marine counterparts in the south.
Finally, we generated a parallelism index (PI) for each population.For each LD cluster, we generated a PI by taking the average PFG across individuals (a measure of the proportion of individuals that have the freshwater-adapted haplotype).We then estimated the mean PI for each population across all LD clusters.We used a one-way ANOVA to test for differences in PI across the three European regions: Northern Europe, Iberian Peninsula and Adriatic, and used a Tukey's honest significance test for pairwise comparisons.It's important to note that the (PI) of a population as measured using PFG might represent different genomic regions across individuals.

(a) Population structure
The pairwise F ST matrix showed clear geographic patterns of genetic differentiation (electronic supplementary material, figure S2).In the Pacific Ocean region, marine populations (RABBIT-SLOUGH, RUS-AN, RUS-ASH, RUS-KHA) have low levels of genetic differentiation, while freshwater populations showed higher levels of differentiation (electronic supplementary material, figure S2).In Northern Europe populations showed low pair-wise levels of differentiation independently of the habitat type (electronic supplementary material, figure S2).Within the Adriatic Sea and Iberian Peninsula populations showed the highest levels of pairwise genetic differentiation (electronic supplementary material, figure S2).
The PCA identified three clusters along the 1st PC axis: a European cluster, a Pacific marine cluster (RABBIT-SLOUGH, RUS-AN, RUS-ASH and RUS-KHA) and a Pacific freshwater cluster (BEAR-PAW-LAKE and BOOT-LAKE; figure 1c).Analysis excluding the Pacific samples distinguished six less divergent clusters: three Iberian Peninsula clusters, two Adriatic Sea clusters and a northern European cluster (figure 1d).The most separated cluster along the 1 st PC axis consisted of the Iberian Peninsula populations IP-MI and IP-SA (figure 1d).One other cluster is exclusively formed by the IP-TE Iberian Peninsula population and another cluster by the rest of the Iberian Peninsula populations (IP-LI, IP-RA, IP-VO).The 2nd PC separated the Adriatic Sea and Northern European populations (figure 1c).The Adriatic Sea clusters that could be distinguished were formed by AD-MON, AD-KR, AD-ITA and AD-M populations and by AD-BOS, AD-NB and AD-NN populations (figure 1d).These latter populations are geographically proximate, while the former ones are genetically differentiated from each other.

(b) Demographic history
Most freshwater populations from southern Europe showed marked declines in Ne within the past 10 000 years, suggestive of either a recent colonisation of freshwater habitats or a recent cessation of gene flow with marine populations (electronic supplementary material, figure S4).A few showed evidence of strong bottlenecks followed by population growth (electronic supplementary material, figure S4).

(c) Genetic diversity (i) ROH analyses
Marine individuals had low values of both N ROH and S ROH , whereas most freshwater individuals tended to have higher N ROH and S ROH values (figure 2a).The highest N ROH and S ROH values were found from the Adriatic Sea (KR in particular), Iberian and Alaskan freshwater populations (figure 2a).
royalsocietypublishing.org/journal/rspb Proc.R. Soc.B 291: 20232617 Hence, π was higher in marine than in freshwater populations and increased with latitude whereas F ROH showed the opposite patterns (figure 2b,c).An ANOVA showed that π differed significantly among different geographic regions (F 7,24 = 11.6,p = 2.32 × 10 −06 ), and post-hoc analyses indicated that the Adriatic Sea populations had lower π compared to other regions (except for Iberian Peninsula and Mur River populations, electronic supplementary material, figure S3).Among northern populations, the only significant difference in π was found between Eastern Russia and Fennoscandian populations, the latter having lower π (electronic supplementary material, figure S3).

(d) Genetic parallelism
The results from both the projected (figure 3) and non-projected (electronic supplementary material, figure S5) QDA show that most of the LD clusters present high accuracy of the model independently of the approach.Clusters 25 was removed from analyses as accuracy estimated from cross-validation was <0.75.Clusters 11b and 20 were removed only from the projected QDA analyses, as they showed low accuracy in the QDA from projected data but not for the unprojected.For most of the clusters, southern European freshwater individuals had PFGs approaching 1 more often than individuals from the northern European freshwater populations (figure 3 and electronic supplementary material, figure S5), with Iberian Peninsula populations showing the strongest patterns of parallelism.These patterns are also clearly apparent from the PC1 scores obtained from PCA from the projected and non-projected PCAs (electronic supplementary material, figures S6 and S7).The Parallelism Indices differed significantly among regions, with freshwater populations from the Iberian Peninsula having the highest and the Northern European populations having the lowest (figure 4; ANOVA: F 2,35 = 35.78,p < 0.001) Visualising PCAs (cores of the 1st PC) of the LD clusters of global parallelism shows a similar patterns for both approaches, with marine populations grouping together as expected, and a tendency for a more pronounced grouping among Pacific (Alaska and East Russia) than among northern European marine populations (electronic supplementary material, figure S6 and S7).Independently of the PCA approach, for almost all LD clusters, Adriatic Sea and Iberian Peninsula freshwater populations tended to group at the opposite end of the axis than the marine populations and closely with freshwater populations from the Eastern Pacific.In contrast, the northern European freshwater populations showed a more heterogeneous pattern (electronic supplementary material, figure S6 and S7).In other words, Adriatic Sea and Iberian Peninsula freshwater populations showed more genetic parallelism than northern European freshwater populations.

Discussion
Analyses of nucleotide diversity and runs of homozygosity indicate that in contrast to northern European populations, the southern European populations have lost genetic diversity due to population bottlenecks and inbreeding and that they have    royalsocietypublishing.org/journal/rspbProc.R. Soc.B 291: 20232617 experienced stronger drift resulting in higher degree of genetic differentiation.However, contrary to what we expected, southern European freshwater populations exhibited a higher degree of genetic parallelism than those from northern Europe.This suggests that the loss of genetic diversity in southern populations likely occurred after they had adapted to freshwater environments.Counterintuitively, freshwater populations in northern Europe, which have high genetic diversity and are connected to marine populations, show the lowest levels of parallelism.Below, we discuss these findings and evaluate the hypotheses that could explain these somewhat surprising results.The observed latitudinal trend in genetic diversity is the exact opposite of what is generally expected from a recolonization of higher latitudes from southern refugia [46,47].This is however understandable in light of the fact that many three-spined stickleback populations in the south have experienced long-term isolation [27,40,69], as indicated by our finding of higher population structure in the south than in the north.Furthermore, ROH analyses indicated that population bottlenecks and inbreeding have contributed to the lower genetic diversity in the south.For instance, the Adriatic Sea AD-KR population showed clear signs of being both bottlenecked and consanguineous [61].While the other southern populations did not show such clear patterns, some other Adriatic Sea populations (AD-NB and AD-BOS) displayed patterns consistent with population bottlenecks and the Iberian population IP-SA showed indications of consanguinity [61].
Assuming that the likely isolation and consequent loss of access to SGV harboured by marine populations reduces the likelihood of parallel evolution (and in general of local adaptation), we expected southern European populations to display reduced levels of genetic parallelism when compared to northern European freshwater populations, which are still in contact with marine populations, and therefore, have access to their reservoir of SGV.However, our results contradict this hypothesis: for most LD clusters of global parallelism individuals from southern Europe were more likely to harbour freshwater-adapted genotypes than their northern European counterparts, and showed higher parallelism index.
This suggests that the loss of genetic diversity that characterized populations from the Iberian Peninsula and the Mediterranean basin is the result of a process that started after these populations invaded and adapted to the freshwater environment.We further suggest that the lower degree of genetic parallelism in northern than in southern Europe could be explained by differences in ongoing gene flow between northern and southern European populations.Contact between marine and freshwater populations in northern Europe provides freshwater populations with adaptive variation from a common pool of SGV, but strong gene flow can limit local adaptation by counteracting allele frequency changes driven by selection [20].Such gene swamping could explain the reduced level of genetic parallelism in northern freshwater populations [70,71].Furthermore, Magalhaes et al. [72] showed that there is an association between the extent of genetic parallelism and environmental similarities between freshwater habitats.It is therefore possible that differences in environmental homogeneity in southern and northern European freshwater populations may play a role in shaping the observed patterns of genetic parallelism.Differences in age of southern (old) and northern (young) European populations are unlikely to explain the differences in the degree of parallelism, as local adaptation from SGV can be established very rapidly [17,37,73].
For some of the LD clusters of global parallelism, there were no differences between northern European marine and freshwater populations, supporting the notion that global parallelism in three-spined stickleback populations is not as pervasive as previously thought [3], especially when there is marine-freshwater gene flow.To further confirm these results, other regions known to be associated with marine-freshwater adaptation but not exhibiting high LD (e.g.chrVII [23]) could be studied in the future.On the other hand, although Alaskan marine and freshwater populations are still in contact, they displayed more distinct patterns of freshwater parallelism than northern European marine and freshwater populations.There are at least two non-mutually exclusive explanations for this: (1) there is less gene flow between marine and freshwater populations in the Pacific than in northern Europe, and (2) Pacific freshwater populations are the result of ancient differentiation followed by secondary contact, and thus have not been independently colonized as in the rest of its distribution range [3,74].
Looking specifically at the LD clusters representing chromosomal inversions, individuals from populations in the Iberian Peninsula tended to have the haplotype associated with freshwater adaptation.In contrast, freshwater populations in northern Europe were polymorphic and many individuals carried the marine haplotype.These patterns suggest that the strength of gene flow between marine and freshwater populations may play an important role in determining the predominant inversion variant in freshwater populations.In a scenario with two alleles and antagonistic environmental effects, gene swamping is expected when m/s > α/|1−α| (where m is the migration rate, s the selection coefficient and α the antagonistic environmental effect [20]).In southern Europe, the lack of contact between freshwater and marine populations could have facilitated selection of freshwater variants for some inversions.In contrast, in northern Europe high migration rates may be limiting the efficiency of selection.We also observed some heterogeneity in southern European freshwater populations, with some populations showing higher frequencies of freshwater adapted genotypes than others.This could be due to either geographic heterogeneity in the strength of selection or the stochastic loss of freshwater-adapted karyotypes during bottlenecks at the time of colonization.The first hypothesis seems more plausible in the case of populations with the different karyotype variants for the same inversion, while the second in the case of populations with only one inversion variant.
In conclusion, the results show that the southern European stickleback populations have reduced levels of genetic diversity and generally exhibit a higher level of genetic parallelism than their northern European contemporaries.This suggests that the high degree of genetic parallelism in the south is likely the result of the early colonization history of these populations, and that the loss of diversity occurred subsequently after the ancestral marine population was eradicated.We further suggest that the lower degree of genetic parallelism in northern than southern European populations could be a result of gene swamping, explaining the similarities between northern European marine and freshwater populations both in the genetic diversity and parallelism analyses.
Ethics.This work did not require ethical approval from a human subject or animal welfare committee.All samples were collected during the local breeding seasons using mist-nets, minnow traps or electric fishing, under appropriate required national permits granted to collectors listed in the acknowledgements.In Finland, sampling was done with the approval of land owners, in accordance with the Finnish Fishing Law (5Â § 27.5.2011/600).Samples from Portugal were collected under permits from the Instituto Instituto da Conservação da Natureza e das Florestas royalsocietypublishing.org/journal/rspb Proc.R. Soc.B 291: 20232617

Figure 1 .
Figure 1.Location of samples and PCA (a) World map showing the location of the populations analysed and their habitat type.(b) Location of European samples.(c) PCA performed using genome-wide SNPs of all samples clusters all European populations together and shows two clusters in the Pacific, one corresponding to marine and the other to freshwater populations.(d ) PCA of European samples shows three clusters for Iberian Peninsula populations, one cluster for all northern European populations and one large cluster that could be divided into two for the Adriatic Sea populations.Samples are coloured according to geographical region.Filled and empty circles represent freshwater and marine populations, respectively.

Figure 2 .
Figure 2. N ROH versus S ROH and nucleotide diversity and mean F ROH as a function of latitude.(a) The distribution of N ROH against S ROH shows that the Adriatic Sea, Iberian Peninsula and Pacific populations have experienced demographic phenomena that have increased the homozygosity.(b) Mean nucleotide diversity and (c) mean F ROH as a function of latitude.Samples are coloured according to the geographic clusters used for the analyses.Filled and empty circles represent freshwater and marine populations, respectively.

Figure 3 .Figure 4 .
Figure 3. Quadratic discriminant analysis (QDA) of LD-clusters of global parallelism, based on projected PCs.QDA showing the model with the best accuracy for each of the LD clusters of global parallelism.The accuracy and the number of PCs used for the models in each of the clusters can be seen in the x axis.Percentages show the proportion of individuals from each population that are predicted to have the freshwater genotype.For the populations with no percentage, PGF approaches either 0 or 1 for all individuals.Samples are coloured according to geographic location and shaped according to habitat type.Dark blue labels represent marine populations, sky blue labels freshwater populations.

Table 1 .
Correlation between habitat type and latitude with genetic diversity estimates at population level.Given are the regression coefficients and associated p values.